EchoPro Transect Subset Workflow#

Import libraries and configure the Jupyter notebook#

# libraries used in the Notebook
import matplotlib.pyplot as plt
import math 
import numpy as np 

# Python version of EchoPro
import EchoPro

# Allows us to grab the SemiVariogram class so we can use its models
from EchoPro.computation import SemiVariogram as SV

# obtain all visualization routines
from EchoPro.visualization import plot_layered_points, plot_kriging_results

# Allows us to easily use matplotlib widgets in our Notebook
# %matplotlib widget

Set up EchoPro for a specific survey year#

Initialize EchoPro object using configuration files#

  • initialization_config.yml – parameters independent of survey year

  • survey_year_2019_config.yml – parameters specific to survey year

  • source – Defines the region of data to use, e.g., US, CAN, or US & CAN

  • exclude_age1 – States whether age-1 hake should be excluded from the analysis.

%%time
survey_2019 = EchoPro.Survey(init_file_path='../config_files/initialization_config.yml',
                             survey_year_file_path='../config_files/survey_year_2019_config.yml',
                             source=3, 
                             exclude_age1=True)
A full check of the initialization file contents needs to be done!
A check of the survey year file contents needs to be done!
CPU times: user 4.76 ms, sys: 303 µs, total: 5.06 ms
Wall time: 5.17 ms

Load and process input data#

  • The loaded data is stored in survey_2019 (e.g., the NASC data in survey_2019.nasc_df)

%%time 
survey_2019.load_survey_data()
CPU times: user 1.37 s, sys: 0 ns, total: 1.37 s
Wall time: 1.37 s
survey_2019.nasc_df.head()
              vessel_log_start  vessel_log_end   latitude    longitude  stratum_num  transect_spacing  NASC  haul_num
transect_num
1                   744.016009      744.491145  34.397267  -121.143005            1              10.0   0.0         0
1                   744.500276      744.995605  34.397391  -121.133196            1              10.0   0.0         0
1                   745.004125      745.499447  34.397435  -121.123057            1              10.0   0.0         0
1                   745.508199      745.994306  34.397394  -121.112871            1              10.0   0.0         0
1                   746.003214      746.495701  34.397437  -121.102888            1              10.0   0.0         0

Select a subset of the available transects to analyze#

# obtain all unique transects in nasc_df
unique_transects = survey_2019.nasc_df.index.unique().values
# set the percentage of transects that should be removed
removal_percentage = 50.0

# determine the number of transects that should be selected
num_sel_transects = math.floor(len(unique_transects) * (1.0 - removal_percentage / 100.0))

# initialize the random number generator object and fix the seed
rng = np.random.default_rng(seed=1234)

# randomly select transects without replacement
selected_transects = list(rng.choice(unique_transects, num_sel_transects, replace=False))
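You can check the selection arithmetic above on its own. Below is a self-contained sketch that uses hypothetical transect numbers 1 to 25, not the 2019 survey's list. `select_transect_subset` is an illustrative helper and not an EchoPro function. Because `np.unique` sorts its input, the transects picked here need not match those selected from `nasc_df` above, even with the same seed.

```python
import math

import numpy as np


def select_transect_subset(transects, removal_percentage, seed=1234):
    """Randomly keep (100 - removal_percentage)% of the unique transects (hypothetical helper)."""
    unique = np.unique(np.asarray(transects))
    # same arithmetic as the notebook cell: round the kept count down
    n_keep = math.floor(len(unique) * (1.0 - removal_percentage / 100.0))
    rng = np.random.default_rng(seed=seed)
    # sampling without replacement guarantees no transect is chosen twice
    return sorted(rng.choice(unique, n_keep, replace=False).tolist())


# hypothetical transect numbers 1..25 (not the survey's actual transects)
subset = select_transect_subset(range(1, 26), removal_percentage=50.0)
print(len(subset))  # floor(25 * 0.5) = 12
```

`math.floor` rounds down. With 25 transects and `removal_percentage = 50.0`, 12 transects are kept, so slightly more than half are removed. Fixing the seed makes the subset reproducible from run to run.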

Compute the areal biomass density on a subset of transects#

  • The areal biomass density is stored in survey_2019.bio_calc.transect_results_gdf as biomass_density_adult
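For context, the standard echo-integration relation links NASC to areal density:

- numerical density = NASC / σ_sp, with σ_sp = 4π·10^(TS/10)
- biomass density = numerical density × mean weight

The sketch below shows only that relation. `nasc_to_biomass_density` is a hypothetical helper, not part of EchoPro, and the target strength and mean weight are passed in as assumed inputs. EchoPro's own calculation also derives these from the survey's biological data and splits the result by age (note the biomass_age_bin_* columns below).

```python
import numpy as np


def nasc_to_biomass_density(nasc, ts_db, mean_weight_kg):
    """Convert NASC (m^2 nmi^-2) to areal numerical and biomass density.

    nasc           : nautical area scattering coefficient (m^2 nmi^-2)
    ts_db          : assumed mean target strength of the fish (dB re 1 m^2)
    mean_weight_kg : assumed mean weight of an individual fish (kg)
    """
    sigma_bs = 10.0 ** (np.asarray(ts_db, dtype=float) / 10.0)  # backscattering cross-section, m^2
    sigma_sp = 4.0 * np.pi * sigma_bs  # spherical scattering cross-section, m^2
    numerical_density = np.asarray(nasc, dtype=float) / sigma_sp  # fish nmi^-2
    biomass_density = numerical_density * mean_weight_kg  # kg nmi^-2
    return numerical_density, biomass_density


# 1000 fish nmi^-2 at TS = -40 dB, each weighing 0.5 kg
num_density, bio_density = nasc_to_biomass_density(4.0 * np.pi * 1e-4 * 1000.0, -40.0, 0.5)
```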

%%time
survey_2019.compute_transect_results(selected_transects=selected_transects)
CPU times: user 1.45 s, sys: 0 ns, total: 1.45 s
Wall time: 1.45 s
survey_2019.bio_calc.transect_results_gdf.head()
latitude longitude stratum_num transect_spacing geometry numerical_density numerical_density_adult biomass_density biomass_density_adult interval ... biomass_age_bin_14 biomass_age_bin_15 biomass_age_bin_16 biomass_age_bin_17 biomass_age_bin_18 biomass_age_bin_19 biomass_age_bin_20 biomass_age_bin_21 biomass_age_bin_22 NASC_adult
transect_num
5 35.065000 -120.726321 1 9.99037 POINT (-120.72632 35.06500) 0.0 0.0 0.0 0.0 0.393227 ... 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
5 35.064833 -120.734500 1 9.99037 POINT (-120.73450 35.06483) 0.0 0.0 0.0 0.0 0.504411 ... 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
5 35.064667 -120.744833 1 9.99037 POINT (-120.74483 35.06467) 0.0 0.0 0.0 0.0 0.497993 ... 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
5 35.064333 -120.754879 1 9.99037 POINT (-120.75488 35.06433) 0.0 0.0 0.0 0.0 0.503068 ... 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
5 35.063833 -120.765167 1 9.99037 POINT (-120.76517 35.06383) 0.0 0.0 0.0 0.0 0.495629 ... 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0

5 rows × 38 columns

print(f"Total Biomass Estimate without Kriging: {1e-6*survey_2019.bio_calc.transect_results_gdf.biomass_adult.sum():.3f} kmt")
Total Biomass Estimate without Kriging: 1438.671 kmt

Note:

Once the biomass density has been calculated for the selected transects, all subsequent steps of the workflow can be run as usual. However, we recommend computing the semi-variogram parameters from the full data set rather than from a subset of transects. As a reminder, a warning is raised if the semi-variogram routine is run on subset data (demonstrated below).

Jolly-Hampton CV Analysis#

  • Compute the mean of the Jolly-Hampton CV value on data that has not been Kriged

  • Note: the algorithm used to compute this value is stochastic, so the result may vary slightly from run to run
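The Jolly-Hampton CV comes from the stratified transect estimator of Jolly and Hampton (1990). That estimator treats each transect as a sample unit weighted by its length. Below is a minimal NumPy sketch of the estimator, plus a resampling loop that mimics the randomness noted above.

This is not EchoPro's `run_cv_analysis`, and several pieces are assumptions:

- the randomness comes from repeatedly subsampling transects within each stratum;
- `frac`, `n_reps`, `seed` and both function names are illustrative only;
- stratum areas are placeholders.

```python
import numpy as np


def jolly_hampton_cv(density, length, stratum, area):
    """CV of a stratified total estimate (Jolly & Hampton style).

    density : mean areal biomass density of each transect
    length  : length of each transect (used as a weight)
    stratum : stratum label of each transect
    area    : dict mapping stratum label -> stratum area
    """
    density = np.asarray(density, dtype=float)
    length = np.asarray(length, dtype=float)
    stratum = np.asarray(stratum)
    total, variance = 0.0, 0.0
    for k, area_k in area.items():
        in_k = stratum == k
        n_k = int(in_k.sum())
        if n_k < 2:
            raise ValueError(f"stratum {k!r} needs at least two transects")
        w = length[in_k] / length[in_k].mean()  # weights sum to n_k
        rho = density[in_k]
        rho_bar = np.sum(w * rho) / n_k  # length-weighted mean density
        var_k = np.sum(w**2 * (rho - rho_bar) ** 2) / (n_k * (n_k - 1))
        total += area_k * rho_bar
        variance += area_k**2 * var_k
    return np.sqrt(variance) / total


def mean_jolly_hampton_cv(density, length, stratum, area,
                          frac=0.75, n_reps=200, seed=0):
    """Average the CV over random within-stratum subsamples of transects."""
    density = np.asarray(density, dtype=float)
    length = np.asarray(length, dtype=float)
    stratum = np.asarray(stratum)
    rng = np.random.default_rng(seed)
    cvs = []
    for _ in range(n_reps):
        keep = []
        for k in area:
            idx = np.flatnonzero(stratum == k)
            # keep at least two transects so the variance is defined
            n_sel = min(len(idx), max(2, round(frac * len(idx))))
            keep.extend(rng.choice(idx, n_sel, replace=False))
        keep = np.asarray(keep)
        cvs.append(jolly_hampton_cv(density[keep], length[keep],
                                    stratum[keep], area))
    return float(np.mean(cvs))


# toy example: two strata of unit area, equal-length transects
density = [1.0, 3.0, 2.0, 4.0, 6.0]
length = [1.0, 1.0, 1.0, 1.0, 1.0]
stratum = ["A", "A", "B", "B", "B"]
area = {"A": 1.0, "B": 1.0}
cv = jolly_hampton_cv(density, length, stratum, area)
```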

%%time
CV_JH_mean = survey_2019.run_cv_analysis(kriged_data=False)
CPU times: user 1.27 s, sys: 10.7 ms, total: 1.28 s
Wall time: 1.28 s
print(f"Mean Jolly-Hampton CV: {CV_JH_mean:.4f}")
Mean Jolly-Hampton CV: 0.2195

Obtain Kriging Mesh Data#

Access Kriging mesh object#

  • Reads mesh data files specified by survey_2019

krig_mesh = survey_2019.get_kriging_mesh()

Plot the Mesh, Transects and smoothed isobath contour#

  • Generate interactive map using the Folium package

  • Mesh points are in gray (they are omitted here because plot_mesh_points=False)

  • Transect points are represented by a changing color gradient

  • Smoothed contour points (200m isobath) are in blue

fmap = plot_layered_points(krig_mesh, plot_mesh_points=False)
fmap

Apply coordinate transformations#

  • Longitude transformation

  • Lat/Lon to distance
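The notebook does not show how `apply_coordinate_transformation` works internally. The sketch below illustrates one common way to carry out the two steps listed above.

- **Longitude transformation:** express each longitude relative to a reference contour (such as the smoothed 200 m isobath) at the same latitude.
- **Lat/Lon to distance:** convert degrees to distance with an equirectangular approximation.

`transform_coordinates`, `lat_ref` and the nautical-mile scaling are assumptions for illustration. EchoPro's actual reference contour, offsets and output units may differ.

```python
import numpy as np


def transform_coordinates(lat, lon, iso_lat, iso_lon, lat_ref, nmi_per_deg=60.0):
    """Hypothetical two-step transformation of (lat, lon) to (x, y) distances."""
    lat = np.asarray(lat, dtype=float)
    lon = np.asarray(lon, dtype=float)
    # 1) longitude relative to the reference contour at the same latitude
    #    (np.interp requires iso_lat to be increasing)
    lon_rel = lon - np.interp(lat, iso_lat, iso_lon)
    # 2) degrees -> nautical miles (1 degree of latitude is ~60 nmi;
    #    east-west distance shrinks with cos(latitude))
    x = nmi_per_deg * lon_rel * np.cos(np.radians(lat))
    y = nmi_per_deg * (lat - lat_ref)
    return x, y


# hypothetical, roughly north-south contour
iso_lat = np.array([34.0, 36.0])
iso_lon = np.array([-121.0, -122.0])
x, y = transform_coordinates([35.0, 35.0], [-121.5, -120.5], iso_lat, iso_lon, lat_ref=34.0)
```

A point lying on the contour maps to x = 0, so transects running offshore from the contour line up near a common origin.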

Transect points#

krig_mesh.apply_coordinate_transformation(coord_type='transect')

Mesh points#

krig_mesh.apply_coordinate_transformation(coord_type='mesh')
# plot the transformed mesh points 
plt.plot(krig_mesh.transformed_mesh_df.x_mesh, 
         krig_mesh.transformed_mesh_df.y_mesh, 'r*', markersize=1.25)
plt.show()
[Figure: transformed mesh points, y_mesh vs. x_mesh]

Try to initialize the Semi-Variogram#

semi_vario = survey_2019.get_semi_variogram(
    krig_mesh,
    params=dict(nlag=30, lag_res=0.002)
)
/usr/mayorgadat/workmain/acoustics/gh/uw-echospace/EchoPro/EchoPro/survey.py:499: UserWarning: The biomass data being used is a subset of the full dataset. It is recommended that you use the biomass data created from the full dataset. To silence this warning set the warning argument to False.
  warn(

As expected, a warning is raised reminding us not to compute or fit the semi-variogram using data generated from a subset of the full data set.
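The warning concerns the empirical semi-variogram, which is estimated from pairs of transect points. Below is a NumPy sketch of the classical (Matheron) estimator, showing what `nlag` and `lag_res` plausibly control. We assume `nlag` is the number of lag bins and `lag_res` the bin width in transformed distance units; the source does not confirm this reading. `empirical_semivariogram` is a hypothetical helper, not part of EchoPro.

```python
import numpy as np


def empirical_semivariogram(x, y, z, nlag=30, lag_res=0.002):
    """Classical estimator: gamma(h) = sum((z_i - z_j)^2) / (2 * N(h))."""
    x, y, z = (np.asarray(a, dtype=float) for a in (x, y, z))
    i, j = np.triu_indices(len(z), k=1)  # every unordered pair once
    dist = np.hypot(x[i] - x[j], y[i] - y[j])
    sq_diff = (z[i] - z[j]) ** 2
    lag_idx = np.rint(dist / lag_res).astype(int)  # nearest lag bin centre
    keep = lag_idx < nlag  # drop pairs beyond the last lag
    n_pairs = np.bincount(lag_idx[keep], minlength=nlag)
    sums = np.bincount(lag_idx[keep], weights=sq_diff[keep], minlength=nlag)
    with np.errstate(invalid="ignore", divide="ignore"):
        gamma = sums / (2.0 * n_pairs)  # NaN where a bin has no pairs
    lags = np.arange(nlag) * lag_res
    return lags, gamma, n_pairs


# three points on a line with a linear trend in z
lags, gamma, n_pairs = empirical_semivariogram(
    [0.0, 1.0, 2.0], [0.0, 0.0, 0.0], [0.0, 1.0, 2.0], nlag=3, lag_res=1.0)
```

Every pair of points contributes to some lag bin. Dropping roughly half of the points leaves only about a quarter of the pairs, which is one reason a fit from a subset of transects is less reliable.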

Perform Ordinary Kriging of areal biomass density#

The kriging routine takes as inputs:

  • the transformed mesh points

  • the semi-variogram model

  • the areal biomass density
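To make these inputs concrete, below is a minimal NumPy sketch of ordinary kriging at a single mesh point. It is not EchoPro's implementation. The parameter names are borrowed from the next cell, and their meanings are our interpretation:

- `k_min` and `k_max` bound the number of nearest transect points used;
- `R` is a search radius.

`ratio`, and whatever EchoPro does when too few neighbours are found, are not reproduced. `ordinary_krige_point` is hypothetical.

```python
import numpy as np


def ordinary_krige_point(xy_obs, z_obs, xy_target, variogram,
                         k_min=3, k_max=10, R=np.inf):
    """Ordinary kriging estimate and variance at one target location.

    xy_obs    : (m, 2) transformed coordinates of the observations
    z_obs     : (m,) observed values (e.g. areal biomass density)
    xy_target : (2,) transformed coordinates of the mesh point
    variogram : callable mapping distance(s) to semi-variance
    """
    xy_obs = np.asarray(xy_obs, dtype=float)
    z_obs = np.asarray(z_obs, dtype=float)
    d = np.linalg.norm(xy_obs - np.asarray(xy_target, dtype=float), axis=1)

    # keep at most k_max nearest observations, and only those within R
    nearest = np.argsort(d)[:k_max]
    nbrs = nearest[d[nearest] <= R]
    if len(nbrs) < k_min:
        return np.nan, np.nan  # too few neighbours: no estimate

    p = xy_obs[nbrs]
    n = len(nbrs)
    # system [Gamma 1; 1^T 0] [w; mu] = [gamma_0; 1]; the last row forces
    # the weights to sum to one, which makes the estimator unbiased
    A = np.ones((n + 1, n + 1))
    A[:n, :n] = variogram(np.linalg.norm(p[:, None, :] - p[None, :, :], axis=-1))
    A[n, n] = 0.0
    b = np.ones(n + 1)
    b[:n] = variogram(d[nbrs])

    sol = np.linalg.solve(A, b)
    w, mu = sol[:n], sol[n]
    estimate = float(w @ z_obs[nbrs])
    kriging_variance = float(w @ b[:n] + mu)
    return estimate, kriging_variance


# made-up data and an illustrative variogram (not the survey's values)
def gamma(h):
    return 1.0 - np.exp(-((np.asarray(h) / 0.5) ** 1.5))


pts = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, 0.5]]
vals = [2.0, 4.0, 3.0, 5.0, 3.5]
est, var = ordinary_krige_point(pts, vals, [0.4, 0.6], gamma)
```

With no nugget, ordinary kriging is an exact interpolator: at an observation location it returns the observed value with zero kriging variance.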

Initialize Kriging routine#

kriging_params = dict(
    # kriging parameters
    k_max=10,
    k_min=3,
    R=0.0226287,
    ratio=0.001,
    
    # parameters for semi-variogram model
    s_v_params={'nugget': 0.0, 'sill': 0.95279, 'ls': 0.0075429,
                'exp_pow': 1.5, 'ls_hole_eff': 0.0},
    
    # grab appropriate semi-variogram model
    s_v_model=SV.generalized_exp_bessel
)

# initialize kriging routine
krig = survey_2019.get_kriging(kriging_params)
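The model passed as `s_v_model` is an EchoPro function whose source is not shown here. As a sketch, we assume the generalized exponential-Bessel form

γ(h) = nugget + (sill − nugget)[1 − exp(−(h/ls)^exp_pow)·J0(ls_hole_eff·h)]

With `ls_hole_eff = 0.0`, as in the cell above, the hole-effect factor is J0(0) = 1, so only the generalized exponential part needs evaluating. The function name `generalized_exp` is hypothetical.

```python
import numpy as np


def generalized_exp(h, nugget=0.0, sill=0.95279, ls=0.0075429, exp_pow=1.5):
    """Generalized exponential semi-variogram (hole-effect term omitted).

    Assumed form: gamma(h) = nugget + (sill - nugget) * (1 - exp(-(h / ls) ** exp_pow)).
    With nugget = 0 the two common sill conventions coincide.
    """
    h = np.asarray(h, dtype=float)
    return nugget + (sill - nugget) * (1.0 - np.exp(-((h / ls) ** exp_pow)))


# semi-variance at the R value used in kriging_params
gamma_at_R = generalized_exp(0.0226287)
```

With these values, R = 0.0226287 is exactly 3 × ls, and the model has reached about 99.4% of the sill at that distance. If R is the search radius (our assumption), it roughly matches the distance beyond which the modelled spatial correlation is negligible.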

Perform Kriging#

  • Also generates total biomass at mesh points

%%time
krig.run_biomass_kriging(krig_mesh)
CPU times: user 3.34 s, sys: 1.58 s, total: 4.92 s
Wall time: 2.58 s
krig_results = survey_2019.bio_calc.kriging_results_gdf

Convert from kg to kmt (1 kmt = 10^6 kg)

print(f"Total Kriged Biomass Estimate: {1e-6*krig_results.biomass_adult.sum():.3f} kmt")
Total Kriged Biomass Estimate: 1321.333 kmt

Jolly-Hampton CV Analysis for Kriged data#

  • Compute the mean of the Jolly-Hampton CV value on the Kriged data

  • Note: the algorithm used to compute this value is stochastic, so the result may vary slightly from run to run

CV_JH_mean_kriged = survey_2019.run_cv_analysis(kriged_data=True)
print(f"Mean Jolly-Hampton CV for data with Kriging: {CV_JH_mean_kriged:.4f}")
Mean Jolly-Hampton CV for data with Kriging: 0.1363

Plot Kriged Biomass estimate in kmt#

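# convert a copy from kg to kmt so re-running this cell does not rescale the stored results
krig_results_kmt = krig_results.copy()
krig_results_kmt["biomass_adult"] = 1e-6 * krig_results_kmt["biomass_adult"]
# plot mesh points with biomass values > 0
plot_kriging_results(krig_results_kmt, krig_field_name="biomass_adult", greater_than_0=True)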